Police Department Stop Data

Note

This document is a product of the final project for the STAT 570 lecture, focusing on data handling and visualization tools. It is essential to acknowledge that minor errors may be present, and the methods employed may not necessarily reflect the optimal approach related to the data set.

Author:

Buse Bakış - buse.bakis@metu.edu.tr

The San Francisco Police Department (SFPD) Stop Data was designed to capture information to comply with the Racial and Identity Profiling Act (RIPA), or California Assembly Bill (AB)953. SFPD officers collect specific information on each stop, including elements of the stop, circumstances and the perceived identity characteristics of the individual(s) stopped. The information obtained by officers is reported to the California Department of Justice. This data set includes data on stops starting on July 1st, 2018, which is when the data collection program went into effect. Also, the data set includes information about police stops that occurred, including some details about the person(s) stopped, and what happened during the stop. Each row is a person stopped with a record identifier for the stop and a unique identifier for the person. A single stop may involve multiple people and may produce more than one associated unique identifier for the same record identifier.

knitr::opts_chunk$set(echo = TRUE)
knitr::opts_chunk$set(warning = FALSE)
knitr::opts_chunk$set(message = FALSE)

library(lubridate)
library(tidyverse)
library(plotly)
library(dplyr)
library(ggplot2)
library(hrbrthemes)
library(viridis)
library(MASS) 
library(reshape2) 
library(reshape)
library(ggmap)
library(leaflet)
library(stringr)
library(wordcloud)

stop <- read.csv("stopdata.csv", sep = ",")

stop$date <- ymd_hms(stop$stop_datetime)

stop$doj_record_id <- as.factor(stop$doj_record_id)

stop <- stop |>
  group_by(unique_identifier) |>
  mutate(count = n())

Introduction

In this project to investigate police stop data and traffic accident that caused a person injury in San Francisco data is used. One of the data sets is between in 2018-2023 (Police Stop Data), and the other one is between in 2005-2023 (Accident data). However, to be able to work in the same period, it was selected as 2018-2023.

There is a 244934 observation and 87 variable in Police Stop data, and 57456 observation and 57 variable in Accident data.

Firstly, the structure of the Police Stop data was investigated. Only numeric variables are numeric and others even if date variable was also character variable, date variable changed as date with the function from lubridate package, and the monthly trends of number of stops is check with the interactive plotly line plot.

There was a huge dramatic decrease in February 2020, probably pandemic situation is affected the rate of the stop people to be safe from Covid with less interaction. May be police officer just give the penalty tickets without stopping.

stop_monthly <- stop |>
    group_by(date = lubridate::floor_date(date, 'month')) |>
    summarize(count = sum(count)) 

fig <- plot_ly(stop_monthly, x = ~date, y = ~count, type = 'scatter', mode = 'lines') |>
    layout(title = 'Trend of number of people stoped by police 2018-2023', plot_bgcolor = "#e5ecf6", xaxis = list(title = 'Date'), 
         yaxis = list(title = 'Count of stop'))
fig

This data set comprises numerous records of police stops, yet some of them have undergone status changes, specifically being marked as “deleted.” For this project, only data with the status “Completed - Successful Submission” has been included.

stop_age <- stop |>
    group_by(gender = as.factor(perceived_gender), age = perceived_age) |>
    summarize(count = sum(count))
`summarise()` has grouped output by 'gender'. You can override using the
`.groups` argument.
ggplot(stop_age, aes(x = age  , y = count, colour = gender )) +
  geom_line()
Warning: Removed 1 row containing missing values (`geom_line()`).

stop <- stop |> 
  filter(perceived_age < 80 && perceived_age > 16) |> 
  filter(stop_data_record_status == "Completed - Successful Submission")

factor_stop <- as.data.frame(unclass(stop), stringsAsFactors = TRUE)

stop_age2 <- stop |>
    group_by(gender = as.factor(perceived_gender), age = as.factor(perceived_age_group)) |>
    summarize(count =n()) |>
    filter(gender %in% c("Male","Female"))
`summarise()` has grouped output by 'gender'. You can override using the
`.groups` argument.
ggplot(stop_age2, aes(fill=gender, y=count, x=age)) + 
    geom_bar(position="dodge", stat="identity") +
    scale_fill_viridis(discrete = T, option = "E") +
    ggtitle("Gender vs Age Groups") +
    facet_wrap(~gender) +
    theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
    xlab("")

Demographic Information

Certain demographic information has been examined, starting with the analysis of gender and age through a line plot. The most of them are male and their ages is in between 25 to 50. The most prevalent gender, among others, is female within the same age group. A minority group of transgender individuals is present in the data set. As evident from the plot, outliers in age are apparent. In California, the minimum age for obtaining a driver’s license is 16.5, yet the data starts from 1. Additionally, individuals above the age of 80 should not be eligible to drive, but there are values close to 125. To address this, the data has been filtered to include ages between 16 and 80.

summary(factor_stop$perceived_race_ethnicity)
                        Asian        Black/African American 
                        27041                         59846 
           Hispanic/Latino(a) Middle Eastern or South Asian 
                        47937                         16152 
                 Multi-racial               Native American 
                         5989                           356 
             Pacific Islander                         White 
                         2959                         84654 
summary(factor_stop$perceived_age_group)
   18 - 29    30 - 39    40 - 49    50 - 59 60 or over   Under 18 
     64295      78355      51320      33524      16673        767 
stop_eth <- stop |>
    group_by(age = perceived_age_group, 
             ethnic = perceived_race_ethnicity) |>
    summarize(count = sum(count))
`summarise()` has grouped output by 'age'. You can override using the `.groups`
argument.
levels(as.factor(stop_eth$ethnic))
[1] "Asian"                         "Black/African American"       
[3] "Hispanic/Latino(a)"            "Middle Eastern or South Asian"
[5] "Multi-racial"                  "Native American"              
[7] "Pacific Islander"              "White"                        
stop_eth <- stop_eth |> 
  mutate(count = case_when(ethnic == "Asian" ~ count*0.17,
                           ethnic == "Middle Eastern or South Asian"~ count*0.17,
                           ethnic == "Hispanic/Latino(a)"~ count*0.15,
                           ethnic == "White"~ count*0.43, 
                           ethnic == "Black/African American"~ count*0.052,
                           ethnic == "Multi-racial"~ count*0.08, 
                           ethnic == "Native American"~ count*0.005,
                           ethnic == "Pacific Islander"~ count*0.003)
         )

ggplot(data=stop_eth, aes(x=age, y=count, fill=ethnic)) + 
  geom_bar(stat="identity") +
  scale_fill_brewer(palette="Blues")

After checking age and gender values, ethnicity of person is investigated. Across all age groups, individuals identified as white are most frequently stopped by police officers, which aligns with the fact that almost half of the population consists of white individuals. Interestingly, despite Hispanics making up 15% of the population, they exhibit a higher rate of police stops compared to other ethnicities.

Accident Data vs Police Stop Data Trend

To check if there is any connection between the stop rate and crash rate, we used Traffic Accident data collected from the San Francisco government open data library, and we adjusted the date values similar to the corrections made in the Police Stop data. Even if crash rate is less than the police stop rate, it has similar trend with the stop rate. At the beginning of pandemic, the crash rate sharply decreased but after a while it started increase again. Perhaps this trend is influenced by seasonality, as there is a noticeable decrease at the beginning of each year.

crash <- read.csv("crashes.csv", sep = ",")
head(crash)
  unique_id cnn_intrsctn_fkey cnn_sgmt_fkey case_id_pkey tb_latitude
1      4734          21703000       9150000      2008161    37.73103
2     44875          27577000       3578000      2238762    37.78552
3     39944          27694000      12281101      2116991    37.75097
4     10509          25668000       6921000      2183936    37.75558
5     18041          25876000       4008000      2382577    37.76849
6     44932          27567000       3937000      2409306    37.78735
  tb_longitude  geocode_source geocode_location     collision_datetime
1    -122.4290 SFPD-CROSSROADS      CITY STREET 04/09/2005 02:03:00 AM
2    -122.4604 SFPD-CROSSROADS      CITY STREET 09/06/2005 10:30:00 AM
3    -122.4950 SFPD-CROSSROADS      CITY STREET 06/23/2005 02:30:00 PM
4    -122.4296 SFPD-CROSSROADS      CITY STREET 08/10/2005 03:26:00 PM
5    -122.4290 SFPD-CROSSROADS      CITY STREET 11/03/2005 02:21:00 PM
6    -122.4570 SFPD-CROSSROADS      CITY STREET 12/26/2005 08:03:00 AM
     collision_date collision_time accident_year     month day_of_week
1     2005 April 09       02:03:00          2005     April    Saturday
2 2005 September 06       10:30:00          2005 September     Tuesday
3      2005 June 23       14:30:00          2005      June    Thursday
4    2005 August 10       15:26:00          2005    August   Wednesday
5  2005 November 03       14:21:00          2005  November    Thursday
6  2005 December 26       08:03:00          2005  December      Monday
             time_cat juris officer_id reporting_district beat_number
1  2:01 am to 6:00 am  3801       4216                            004
2 10:01 am to 2:00 pm  3801        625                  G        4B4G
3  2:01 pm to 6:00 pm  3801       1629                            3I3
4  2:01 pm to 6:00 pm  3801     002122                  D        4B6D
5  2:01 pm to 6:00 pm  3801       1105                           3E61
6 6:01 am to 10:00 am  3801        334              RICHM         3G2
     primary_rd  secondary_rd distance direction weather_1  weather_2
1    MISSION ST   TRUMBULL ST      132     North    Cloudy    Raining
2 CALIFORNIA ST      02ND AVE       37      West    Cloudy Not Stated
3   SUNSET BLVD     ORTEGA ST      211     South    Cloudy Not Stated
4       HILL ST    SANCHEZ ST      133      East     Clear Not Stated
5     CHURCH ST  RESERVOIR ST       66     South     Clear Not Stated
6    JORDAN AVE CALIFORNIA ST      500     North     Clear Not Stated
          collision_severity type_of_collision                 mviw
1 Injury (Complaint of Pain)        Hit Object         Fixed Object
2            Injury (Severe)             Other              Bicycle
3 Injury (Complaint of Pain)        Hit Object         Fixed Object
4     Injury (Other Visible)         Broadside Parked Motor Vehicle
5     Injury (Other Visible)        Overturned         Fixed Object
6 Injury (Complaint of Pain)          Rear End Parked Motor Vehicle
                   ped_action road_surface          road_cond_1 road_cond_2
1      No Pedestrian Involved          Wet No Unusual Condition  Not Stated
2      No Pedestrian Involved          Dry No Unusual Condition  Not Stated
3      No Pedestrian Involved          Dry No Unusual Condition  Not Stated
4 In Road, Including Shoulder          Dry No Unusual Condition  Not Stated
5      No Pedestrian Involved          Dry     Holes, Deep Ruts  Not Stated
6      No Pedestrian Involved          Wet No Unusual Condition  Not Stated
              lighting control_device    intersection vz_pcf_code vz_pcf_group
1 Dark - Street Lights           None Midblock > 20ft     Unknown      Unknown
2             Daylight           None Midblock > 20ft       22517        22517
3             Daylight    Functioning Midblock > 20ft     Unknown      Unknown
4             Daylight           None Midblock > 20ft     Unknown      Unknown
5             Daylight           None Midblock > 20ft     Unknown      Unknown
6             Daylight           None Midblock > 20ft     Unknown      Unknown
                        vz_pcf_description
1                                  Unknown
2 Opening door on traffic side when unsafe
3                                  Unknown
4                                  Unknown
5                                  Unknown
6                                  Unknown
                                                                                      vz_pcf_link
1                                                                                      Not Stated
2 http://leginfo.legislature.ca.gov/faces/codes_displaySection.xhtml?lawCode=VEH&sectionNum=22517
3                                                                                      Not Stated
4                                                                                      Not Stated
5                                                                                      Not Stated
6                                                                                      Not Stated
  number_killed number_injured
1             0              4
2             0              1
3             0              1
4             0              1
5             0              1
6             0              1
                                                                      street_view
1 https://maps.google.com/maps?q=&layer=c&cbll=37.7310277112579,-122.429020773456
2   https://maps.google.com/maps?q=&layer=c&cbll=37.785522231882,-122.46043606478
3  https://maps.google.com/maps?q=&layer=c&cbll=37.750971046041,-122.494976107694
4 https://maps.google.com/maps?q=&layer=c&cbll=37.7555838090639,-122.429578947769
5 https://maps.google.com/maps?q=&layer=c&cbll=37.7684931481079,-122.429037469419
6 https://maps.google.com/maps?q=&layer=c&cbll=37.7873546399109,-122.456963340925
  dph_col_grp  dph_col_grp_description party_at_fault party1_type
1          AA Vehicle(s) Only Involved             NA      Driver
2          CC          Vehicle-Bicycle              1      Driver
3          AA Vehicle(s) Only Involved             NA      Driver
4          BB       Vehicle-Pedestrian              1      Driver
5          FF             Bicycle Only             NA   Bicyclist
6          AA Vehicle(s) Only Involved              1      Driver
  party1_dir_of_travel  party1_move_pre_acc    party2_type party2_dir_of_travel
1           Not Stated  Proceeding Straight                                    
2                 East               Parked      Bicyclist                 East
3                North         Ran Off Road                                    
4                 East Other Unsafe Turning Parked Vehicle           Not Stated
5                South  Proceeding Straight                                    
6                North  Proceeding Straight Parked Vehicle                North
  party2_move_pre_acc                               point
1                     POINT (-122.429020773 37.731027711)
2 Proceeding Straight POINT (-122.460436065 37.785522232)
3                     POINT (-122.494976108 37.750971046)
4              Parked POINT (-122.429578948 37.755583809)
5                     POINT (-122.429037469 37.768493148)
6              Parked  POINT (-122.456963341 37.78735464)
              data_as_of        data_updated_at         data_loaded_at
1 04/09/2005 12:00:00 AM 04/26/2023 12:00:00 AM 12/12/2023 12:18:08 PM
2 09/06/2005 12:00:00 AM 04/26/2023 12:00:00 AM 12/12/2023 12:18:08 PM
3 06/23/2005 12:00:00 AM 04/26/2023 12:00:00 AM 12/12/2023 12:18:08 PM
4 08/10/2005 12:00:00 AM 04/26/2023 12:00:00 AM 12/12/2023 12:18:08 PM
5 11/03/2005 12:00:00 AM 04/26/2023 12:00:00 AM 12/12/2023 12:18:08 PM
6 12/26/2005 12:00:00 AM 04/26/2023 12:00:00 AM 12/12/2023 12:18:08 PM
crash$collision_datetime <- sub("[[:space:]].*", "", crash$collision_datetime)

crash$collision_datetime <- as.Date(crash$collision_datetime, "%m/%d/%Y")

crash$date <- as.POSIXct(crash$collision_datetime, format="%Y-%m-%d",tz="UTC")

crash <- crash |>
  group_by(unique_id) |>
  mutate(count = n())

crash_monthly <- crash |>
    group_by(date = lubridate::floor_date(date, 'month')) |>
    summarize(count = sum(count)) 

crash_monthly <- subset(crash_monthly, date >= "2018-07-01" & date <= "2023-06-01")
crash_monthly <- rbind(crash_monthly, c("2023-06-01", 211))

monthly_data <- cbind(crash_monthly, stop_monthly)
monthly_data <- monthly_data[,-3]
colnames(monthly_data) <- c("date", "crash_count", "stop_count")



melt_data <- melt(monthly_data, id = c("date")) 
Warning in melt.data.frame(monthly_data, id = c("date")): partial argument
match of 'id' to 'id.vars'
#melt_data

ggplot(crash_monthly, aes(x = date, y = count)) + 
  geom_line() + 
  scale_color_manual(values = "darkred")

ggplot(melt_data, aes(x = date, y = value)) + 
  geom_line(aes(color = variable, linetype = variable)) + 
  scale_color_manual(values = c("darkred", "steelblue"))

Duration Time according to Demographic Information

When the police officer is stopped the person, is the ethnicity or the age affected the stop duration time? To see it, some interactive box-plots are created:

no_na_df<- stop[!is.na(stop$latitude), ]
df <- no_na_df |>
  group_by(longitude,latitude) |>
  summarise(cnt = n())
`summarise()` has grouped output by 'longitude'. You can override using the
`.groups` argument.
out <- boxplot.stats(no_na_df$duration_of_stop)$out
out_ind <- which(no_na_df$duration_of_stop %in% c(out))
#out_ind
df2 <- no_na_df[-out_ind, ]


plot_ly(
  data = df2,
  x = ~perceived_race_ethnicity,
  y = ~duration_of_stop,
  type = "box",
  color = ~perceived_race_ethnicity,
  showlegend = FALSE) %>%
  layout(xaxis = list(title = 'Ethnicity'), 
  yaxis = list(title = 'Duration time (minute)' ) 
  )

At the plot above, it can be seen that African American people have the highest duration of stop rate, but in Asians (normal and middle/south eastern) there is an interesting situation. Although their average duration is shorter from many other ethnicity but they includes many outliers in higher minutes.

plot_ly(
  data = df2,
  x = ~perceived_age_group,
  y = ~duration_of_stop,
  type = "box",
  color = ~perceived_age_group,
  showlegend = FALSE) %>%
  layout(xaxis = list(title = 'Age Group'), 
  yaxis = list(title = 'Duration time (minute)' ) 
  )

Age groups are checked according to duration time. As it is expected, 60 or over people duration time is shorter than the other age groups, but under 18 age group have the highest average duration rate and the longest ranges. However, other age groups duration minute rate are almost exactly the same.

Stop Points vs Crash Points

Police stop points are generally in the same spots. In the Police Stop data set there was some geospatial values for stopping spot as longitude and latitude information. To see this spot in the map longitude and latitude values are grouped and counted as how many people stopped there. The most of the stop points had just 1 stopped. However to see the most popular stopping places having more than 100 stop places are selected. In the one place, the police officer stopped 2569 different people. To see the crashes in the map firstly longitude and latitude values created from point variable because it collected weird way as POINT (…) (What?). To create the geospatial information from this variable with the StringR packages, some string functions deleted unnecessary characters, then separated variable into two different variables as longitude and latitude.

summary(df$cnt)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   1.00    1.00    3.00   13.62   10.00 2542.00 
mybins <- seq(100, 2569, by=400)
mypalette <- colorBin( palette="YlOrBr", domain=quakes$mag, na.color="transparent", bins=mybins)

m <- leaflet(df) %>% 
  addTiles()  %>% 
  setView( lat=37.773972, lng=-122.431297, zoom=11) %>%
  addProviderTiles("Esri.WorldImagery") %>%
  addCircleMarkers(~longitude, ~latitude, 
    fillColor = ~mypalette(cnt), fillOpacity = 0.7, color="white", radius=8, stroke=FALSE,
    labelOptions = labelOptions( style = list("font-weight" = "normal", padding = "3px 8px"), textsize = "13px", direction = "auto")
  ) %>%
  addLegend(pal=mypalette, values=~cnt, opacity=0.5, title = "count", position = "bottomright" )
Warning in mypalette(cnt): Some values were outside the color scale and will be
treated as NA
head(crash$point)
[1] "POINT (-122.429020773 37.731027711)" "POINT (-122.460436065 37.785522232)"
[3] "POINT (-122.494976108 37.750971046)" "POINT (-122.429578948 37.755583809)"
[5] "POINT (-122.429037469 37.768493148)" "POINT (-122.456963341 37.78735464)" 
crash$point<-gsub("POINT ","",as.character(crash$point))
crash$point<-gsub(")","",as.character(crash$point))
crash$point<-gsub("^.{0,1}","",as.character(crash$point))

df1 <- crash

df1[c('lon', 'lat')] <- str_split_fixed(crash$point, ' ', 2)
df1$lon <- as.numeric(df1$lon)
df1$lat <- as.numeric(df1$lat)

df1 <- df1 |>
  group_by(lon,lat) |>
  summarise(cnt = n())
`summarise()` has grouped output by 'lon'. You can override using the `.groups`
argument.
summary(df1)
      lon              lat             cnt         
 Min.   :-122.5   Min.   :37.71   Min.   :  1.000  
 1st Qu.:-122.4   1st Qu.:37.75   1st Qu.:  1.000  
 Median :-122.4   Median :37.77   Median :  1.000  
 Mean   :-122.4   Mean   :37.76   Mean   :  1.837  
 3rd Qu.:-122.4   3rd Qu.:37.78   3rd Qu.:  1.000  
 Max.   :-122.4   Max.   :37.83   Max.   :197.000  
 NA's   :1        NA's   :1                        
mybins1 <- seq(5, 197, by=30)
mypalette1 <- colorBin( palette="YlOrBr", domain=df1$cnt, na.color="transparent", bins=mybins1)

m1 <- leaflet(df1) %>% 
  addTiles()  %>% 
  setView(lat=37.773972, lng=-122.431297, zoom=11) %>%
  addProviderTiles("Esri.WorldImagery") %>%
  addCircleMarkers(~lon, ~lat, 
    fillColor = ~mypalette1(cnt), fillOpacity = 0.7, color="white", radius=8, stroke=FALSE,
    labelOptions = labelOptions( style = list("font-weight" = "normal", padding = "3px 8px"), textsize = "13px", direction = "auto")
  ) %>%
  addLegend(pal=mypalette1, values=~cnt, opacity=0.5, title = "count", position = "bottomright" )
Warning in validateCoords(lng, lat, funcName): Data contains 1 rows with either
missing or invalid lat/lon values and will be ignored
Warning in mypalette1(cnt): Some values were outside the color scale and will
be treated as NA
par(mfrow=c(1,2))
m
m1

After mapping the longitude and latitude variables, there were no surprising results; the upper east side of the city emerged as the most popular location for both crashes and police stops. This area witnessed the highest number of crashes, suggesting that police officers may also be concentrated there to prevent accidents.

Districts and Actions

In San Francisco, there is many districts. However, in this data set, district levels are more than the number of districts. Because some of district wrote as lower case, others are upper case and some of them as N/A. Firstly, N/A’s are dropped from data and others are converted as upper case. After these steps, the district information grouped by and counted and visualized with the word cloud plot. So, the most stop occurred in Southern, Mission and Central of San Francisco. Also, action that have taken is investigated in this project, but also this variable is collected with the weird way, the police officer wrote different action that have taken in one individual with “|” as separator. For dividing the sentences, we used the seperate_delim_longer function. The separated sentences as different value but it leaves white spaces in the end and beginning of the sentences, and this white spaces deleted with str_trim function. Therefore, all the actions taken became a level, and visualized it also word cloud plot. Therefore, it can be seen as the most taken action is “Patrol car detention”.

area <- dplyr::select(stop, c("city","district","unique_identifier", "actions_taken"))
area$city <- as.factor(area$city)
area$district <- as.factor(area$district)

head(area)
# A tibble: 6 × 4
# Groups:   unique_identifier [6]
  city          district        unique_identifier      actions_taken            
  <fct>         <fct>           <chr>                  <chr>                    
1 SAN FRANCISCO TARAVAL         U380119326E22D6100AC_5 Search of person was con…
2 SAN FRANCISCO OUT OF SF / UNK U3801223400D4887116B_1 Property was seized | Ha…
3 SAN FRANCISCO TENDERLOIN      U380121054B6A404E1D6_1 Search of person was con…
4 SAN FRANCISCO BAYVIEW         U380118193C7D32D472B_1 Search of person was con…
5 SAN FRANCISCO BAYVIEW         U38012029453C383DF7A_1 Search of person was con…
6 SAN FRANCISCO OUT OF SF / UNK U3801210484E3C91B19E_1 Search of property was c…
sum(is.na(area$actions_taken))
[1] 0
long <- area %>% separate_longer_delim(actions_taken, "|")

long$actions_taken <- str_trim(long$actions_taken, side = c("both")) #if there is a white space it will delete
long$actions_taken <- as.factor(long$actions_taken)

#levels(long$actions_taken)
#levels(long$district)
long$district <- toupper(long$district)
long$district <- as.factor(long$district)

long1 <- subset(long, district =! "#N/A" )
Warning: In subset.data.frame(long, district = !"#N/A") :
 extra argument 'district' will be disregarded
#levels(long1$district)

df3 <- long |>
  group_by(district, actions_taken) |>
  summarise(count = n()) |>
  arrange(desc(count))
`summarise()` has grouped output by 'district'. You can override using the
`.groups` argument.
df4 <- subset(df3, actions_taken != "None")

df3 <- df3 |>
  group_by(district) |>
  summarise(count=sum(count))

df4 <- df4 |>
  group_by(actions_taken) |>
  summarise(count=sum(count))

df3 %>% with(wordcloud(district, count, max.words = 30, random.order = FALSE, rot.per = 0.35, 
    colors = brewer.pal(8, "Dark2")))

df4 %>% with(wordcloud(actions_taken, count, max.words = 200, random.order = FALSE, rot.per = 0.15, 
    colors = brewer.pal(8, "Dark2")))
Warning in wordcloud(actions_taken, count, max.words = 200, random.order =
FALSE, : Curbside detention could not be fit on page. It will not be plotted.
Warning in wordcloud(actions_taken, count, max.words = 200, random.order =
FALSE, : Search of person was conducted could not be fit on page. It will not
be plotted.
Warning in wordcloud(actions_taken, count, max.words = 200, random.order =
FALSE, : Handcuffed or flex cuffed could not be fit on page. It will not be
plotted.
Warning in wordcloud(actions_taken, count, max.words = 200, random.order =
FALSE, : Search of property was conducted could not be fit on page. It will not
be plotted.

References

San Francisco, California Population 2024. Worldpopulationreview. https://worldpopulationreview.com/us-cities/san-francisco-ca-population

Police Department Stop Data. DataSF. https://data.sfgov.org/Public-Safety/Police-Department-Stop-Data/ubqf-aqzw/about_data